Abstract: Reverberant sound fields are often modeled as isotropic. However,
it has been observed that spatial properties change during the decay of
the sound field energy, due to non-isotropic attenuation in non-ideal rooms.

In this letter, a model for the spatial coherence between two sensors in a 
decaying reverberant sound field is developed for rectangular rooms. The 
modeled coherence function depends on room dimensions, surface reflectivity,
and orientation of the sensor pair, but is independent of the position of
source and sensors in the room. The model includes the spherically isotropic 
(diffuse) and cylindrically isotropic sound field models as special cases.

1. Introduction

A reverberant sound field is generated by the reflections of an excitation signal at the boundaries 
of an enclosed environment. Common models for reverberant sound fields are isotropic sound 
field models which exhibit directionally invariant properties, with spherically isotropic (diffuse) 
or cylindrically isotropic sound fields being two well-known special cases (Cook et al., 1955).
At any point within a spherically isotropic sound field, uncorrelated signals arrive from all 
directions in the three-dimensional space, whereas in a cylindrically isotropic sound field, signals 
arrive only from directions contained in a two-dimensional plane. The use of an isotropic sound 
field to approximate a reverberant sound field can be justified as, after sufficient time, any signal 
originating from a single point in a reverberant environment will have many reflection paths 
to any other point within the sound field. These reflection paths will reach the point within 
the field from many different directions and will have high temporal densities. Additionally, if 
the source signal is also assumed to have limited temporal correlation, reflected signals arriving 
at the receiver can be assumed to be uncorrelated if the observation window length is limited 
(Jacobsen and Roisin, 2000). 

Isotropic sound field models are widely used to approximate reverberant sound fields
in real rooms, e.g., for application to acoustic signal enhancement. However, rooms with strong
variations between the acoustic properties of different surfaces, e.g., due to the use of curtains
or acoustic ceiling tiles, can create a highly non-isotropic sound field with temporally varying
spatial characteristics (Gover et al., 2004).

In this letter, we investigate the temporal evolution of the spatial coherence function
between two sensors in a decaying reverberant sound field in a rectangular room. Spatial
coherence functions are an important measure for the properties of a sound field, and relevant
for various signal processing applications, e.g., dereverberation by spatial postfiltering methods
(McCowan and Bourlard, 2003; Schwarz and Kellermann, 2015). First, the well-known spherically
isotropic (diffuse) and cylindrically isotropic models for idealized stationary sound fields
are briefly reviewed. Then, we develop a specific model for the spatial coherence function of
the short-time stationary decaying reverberant sound field in rectangular rooms. The spatial
coherence function of the decaying sound field is found to be non-isotropic, time-variant,
real-valued, and independent of the position of source and sensors within the room. The practical
applicability of this model is finally evaluated by comparison with coherence estimates obtained
from simulated and measured room impulse responses.


2. Models for the spatial coherence of reverberant sound fields 

We consider a pair of omnidirectional microphones separated by a distance $d$. The sound field
is assumed to be stationary. The spatial coherence function can be expressed as
$\Gamma_{12}(k) = \Phi_{12}(k) / \sqrt{\Phi_{11}(k)\,\Phi_{22}(k)}$, where $k$ is the wavenumber,
$\Phi_{11}(k)$ and $\Phi_{22}(k)$ are the auto-power spectral densities (PSDs) of the two
microphone signals, and $\Phi_{12}(k)$ is the cross-power spectral density (CPSD)
between the two signals.



Fig. 1. Spherical coordinates in rectangular environment 


2.1. Spherically isotropic (diffuse) sound field model 

In a spherically isotropic (diffuse) sound field, sound waves arrive uniformly from all directions
in three-dimensional space. The PSD and CPSD can be calculated via surface integrals, where
each point on the surface represents a direction from which a far-field signal is arriving at the
microphone pair (Cook et al., 1955). With the spherical coordinates defined as shown in Fig. 1,
and the microphone pair aligned with the x axis, a signal arriving at the microphone pair from
a given direction $(\theta, \phi)$ will contribute to the CPSD with the phase term
$e^{-jkd\cos(\theta)}$. To obtain
the CPSD of a spherically isotropic sound field, this contribution can be inserted into a surface
integral over a sphere. A surface integral over the PSD contributions yields the PSD, which
is equal for both microphones ($\Phi_{11}(k) = \Phi_{22}(k)$) since no obstructions are present and both
microphones are assumed to be omnidirectional. The spatial coherence function is then obtained
as (Cook et al., 1955):


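$$\Gamma_{12}(k) = \frac{\int_0^{2\pi}\int_0^{\pi} e^{-jkd\cos(\theta)} \sin(\theta)\, d\theta\, d\phi}{\int_0^{2\pi}\int_0^{\pi} \sin(\theta)\, d\theta\, d\phi} = \operatorname{sinc}(kd). \tag{1}$$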
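As a quick numerical sanity check of Eq. 1, the following Python sketch evaluates the ratio of integrals with a midpoint rule and compares it with sin(kd)/(kd), i.e., the unnormalized sinc. The function names, the grid size, and the speed of sound of 343 m/s are illustrative assumptions and are not taken from the letter.

```python
import cmath
import math


def diffuse_coherence_numeric(k, d, n_theta=2000):
    """Midpoint-rule evaluation of Eq. (1) for a pair aligned with the x axis."""
    # The integrand does not depend on phi, so the phi integrals (2*pi each)
    # cancel between numerator and denominator.
    h = math.pi / n_theta
    num = 0j
    den = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * h
        w = math.sin(theta) * h
        num += cmath.exp(-1j * k * d * math.cos(theta)) * w
        den += w
    return num / den


def sinc(x):
    """Unnormalized sinc, sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


c = 343.0  # assumed speed of sound in m/s
d = 0.08   # microphone spacing in m (value used in the evaluation section)
for f in (500.0, 1000.0, 2000.0, 4000.0):
    k = 2.0 * math.pi * f / c  # wavenumber for frequency f
    g = diffuse_coherence_numeric(k, d)
    print(f"{f:6.0f} Hz  numeric={g.real:+.6f}  sinc(kd)={sinc(k * d):+.6f}")
```

The imaginary part of the numerical result vanishes up to rounding, because contributions from theta and pi - theta are complex conjugates of each other.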


2.2. Cylindrically isotropic sound field model 

For the case of a cylindrically isotropic sound field the coherence can be calculated similarly to 
that of a spherically isotropic field. However, it is now assumed that the sound waves only travel 
in the horizontal (x-y) plane, and the integration is therefore performed over the circumference 
of a circle in the x-y-plane, resulting in the zeroth-order Bessel function of the first kind (Cook
et al., 1955):

$$\Gamma_{12}(k) = \frac{1}{2\pi}\int_0^{2\pi} e^{-jkd\cos(\theta)}\, d\theta = J_0(kd). \tag{2}$$


This model was found to be a good approximation for reverberant sound fields in environments 
with strong attenuation along an axis perpendicular to the axis of the microphone pair, e.g., 
rooms with strongly absorbing floor and ceiling (Elko, 2000; Schwarz and Kellermann, 2015). 
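The cylindrically isotropic result can be checked in the same way. The sketch below averages the phase term over a circle and compares the result with a power-series evaluation of the zeroth-order Bessel function; both helper functions and their default parameters are hypothetical names chosen for this illustration.

```python
import cmath
import math


def cylindrical_coherence_numeric(k, d, n_theta=720):
    """Average of exp(-j k d cos(theta)) over a full circle (midpoint rule)."""
    h = 2.0 * math.pi / n_theta
    total = 0j
    for i in range(n_theta):
        theta = (i + 0.5) * h
        total += cmath.exp(-1j * k * d * math.cos(theta))
    return total / n_theta


def bessel_j0_series(x, terms=40):
    """J_0(x) from its power series sum_m (-1)^m (x/2)^(2m) / (m!)^2.

    Adequate for the moderate arguments used here; not intended for large x.
    """
    total = 0.0
    term = 1.0
    for m in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((m + 1) ** 2)
    return total


for kd in (0.5, 1.5, 3.0, 6.0):
    g = cylindrical_coherence_numeric(kd, 1.0)
    print(f"kd={kd:.1f}  numeric={g.real:+.8f}  J0={bessel_j0_series(kd):+.8f}")
```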


2.3. Proposed model for rectangular rooms 

While the spherically and cylindrically isotropic sound field models are often used to approximate
the spatial coherence function of reverberant sound fields, real rooms rarely have isotropic
properties. In practice, different reflection coefficients of surfaces create a non-isotropic
reverberant sound field, with spatial properties which change during the decay of the sound field
energy. We now develop a model for the time-varying spatial coherence of a sound field which is
decaying from an initial impulse-like excitation at $t = 0$. Unlike for the diffuse and cylindrically
isotropic sound field models, which assume stationarity, the sound field is now only assumed
to be short-time stationary, i.e., stationary within a small interval around the time $t$, with a
spatial coherence $\Gamma_{12}(k,t)$ which is dependent on the time $t$ after the initial excitation.

In order to model sound field behavior within a rectangular room characterized by 
its dimensions and surface reflection coefficients, the ray-based sound model from geometrical 
room acoustics is employed. This model treats a sound field as being composed of incoherent 
rays which propagate only in straight lines and undergo no diffusion or interference during 
propagation and reflection (Kuttruff, 2000). Note that, although we neglect interference, we 
do consider the phase relations of a ray between the two microphones. Rectangular geometries 
along with the ray model have been previously applied to study the energy decay of sound 
fields, e.g., Sakuma (2012); Hirata (1979). However, to the authors’ best knowledge, models for 
the spatial coherence of decaying sound fields in rectangular rooms have not been published.

The initial state of the sound field is modeled as spherically isotropic. This can be
justified because, while the distribution of rays in a sound field is initially dependent on the source
location, it quickly becomes spatially dispersed via reflections, obscuring that it originated from
a single point. The time until this state can be assumed is known as the mixing time and is
typically assumed to be reached once 10 reflected signals are incident within a 24 ms interval
(Polack, 1988; Jot et al., 1997). While it depends on room size, the mixing time is typically
on the order of several tens of milliseconds.

While the sound field initially trends towards isotropy, it has been observed that
isotropy is lost in the long term, since components along directions with stronger attenuation
decay more rapidly (Gover et al., 2004). This effect will be covered by the model proposed
in the following.

A single sound ray arriving at a microphone pair has a velocity $c$, equal to the speed
of sound, and a direction of propagation defined in the spherical coordinate system, shown in
Fig. 1, by $\theta$ and $\phi$. This velocity can be decomposed into its x, y and z components as follows:

$$c_x = c\cos(\theta), \quad c_y = c\sin(\theta)\cos(\phi), \quad c_z = c\sin(\theta)\sin(\phi). \tag{3}$$


The rectangular room in Fig. 1 has dimensions defined by $L_x$, $L_y$, $L_z$. For a sound ray traveling
in this room, the number of collisions with the two walls incident to the x axis, $n_x$, which will
have occurred after time $t$, is solely dependent on the absolute value of the component of its
velocity in the x direction, $|c_x|$, and can be formulated as:

$$n_x = \left\lfloor \frac{t\,|c_x|}{L_x} \right\rfloor \approx \frac{t\,|c_x|}{L_x}. \tag{4}$$

Dropping the floor operator $\lfloor\cdot\rfloor$ is an approximation, since a fractional component is
included in addition to the correct integer number of collisions. However, as time $t$ increases,
this fractional component represents a smaller and smaller percentage of the total number of
collisions and can therefore be neglected. Analogously, the number of collisions of a sound ray
with the walls incident to the y and z axes in a given time period $t$ is given by
$n_y \approx t\,|c_y|/L_y$ and $n_z \approx t\,|c_z|/L_z$, respectively.
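To illustrate how quickly the fractional part becomes negligible, the following sketch compares the floored and the approximate collision counts for one ray at several times. The function name, the chosen ray direction, and the speed of sound of 343 m/s are assumptions made only for this example; the room dimensions are those used later in the evaluation.

```python
import math


def wall_collisions(t, theta, phi, dims, c=343.0):
    """Return (floored, approximate) collision counts per axis after time t."""
    # Absolute velocity components along x, y and z (cf. Eq. 3).
    speeds = (
        abs(c * math.cos(theta)),
        abs(c * math.sin(theta) * math.cos(phi)),
        abs(c * math.sin(theta) * math.sin(phi)),
    )
    approx = [t * v / length for v, length in zip(speeds, dims)]
    exact = [math.floor(n) for n in approx]
    return exact, approx


dims = (6.0, 4.0, 3.0)  # room dimensions in m
theta, phi = math.radians(60.0), math.radians(30.0)  # hypothetical ray direction
for t in (0.05, 0.2, 0.8):
    exact, approx = wall_collisions(t, theta, phi, dims)
    print(t, exact, [round(n, 2) for n in approx])
```

Since the difference between the two counts is always below one collision, its relative size shrinks as the total count grows with time.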

Now a reflection coefficient for each pair of parallel walls $(R_x, R_y, R_z)$ is introduced,
such that the initial power $A_0$ of a sound ray is reduced to $A_0 R_x^2$ after a collision with one of
the walls incident with the x axis. After $n_x$ collisions with walls incident to the x axis, the
power is reduced to $A_0 R_x^{2 n_x}$. After a given time period $t$, the total power
reduction due to collisions can then be closely approximated by:

$$A(\theta,\phi,t) = A_0\, R_x^{\frac{2t|c_x|}{L_x}} R_y^{\frac{2t|c_y|}{L_y}} R_z^{\frac{2t|c_z|}{L_z}} = A_0\, R_x^{\frac{2tc|\cos(\theta)|}{L_x}} R_y^{\frac{2tc\sin(\theta)|\cos(\phi)|}{L_y}} R_z^{\frac{2tc\sin(\theta)|\sin(\phi)|}{L_z}}. \tag{5}$$


If it is desired to model the two walls incident to the x axis with differing reflection coefficients, 
this can be approximated by setting Rx equal to the geometric mean of the two. The resulting 
model error tends to zero as the number of reflections increases. 
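The direction-dependent decay of Eq. 5 and the geometric-mean approximation for unequal opposite walls can be sketched as follows. The function names, the default speed of sound of 343 m/s, and the example coefficients 0.9 and 0.5 are hypothetical choices for illustration.

```python
import math


def ray_power(theta, phi, t, dims, refl, c=343.0, a0=1.0):
    """Power of a ray travelling in direction (theta, phi) after time t (cf. Eq. 5).

    theta is expected in [0, pi], so sin(theta) is non-negative.
    """
    lx, ly, lz = dims
    rx, ry, rz = refl
    st = math.sin(theta)
    ex = 2.0 * t * c * abs(math.cos(theta)) / lx
    ey = 2.0 * t * c * st * abs(math.cos(phi)) / ly
    ez = 2.0 * t * c * st * abs(math.sin(phi)) / lz
    return a0 * rx ** ex * ry ** ey * rz ** ez


def effective_reflection(r_a, r_b):
    """Geometric mean of the coefficients of two opposite walls."""
    return math.sqrt(r_a * r_b)


dims = (6.0, 4.0, 3.0)   # room dimensions in m
refl = (0.8, 1.0, 1.0)   # only the walls incident to the x axis absorb
for name, theta in (("along x", 0.0), ("along y", math.pi / 2.0)):
    p = ray_power(theta, 0.0, 0.2, dims, refl)
    print(f"{name}: {10.0 * math.log10(p):.2f} dB after 0.2 s")
```

For a ray that has hit each of two opposite walls equally often, the geometric mean reproduces the accumulated attenuation exactly; a mismatch of at most one collision remains otherwise, which is why the relative error decreases with the number of reflections.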

Eq. 5 provides a model for the PSD contribution of a sound ray, with initial PSD $A_0$,
traveling in the direction $(\theta, \phi)$. Due to the initial diffuseness of the sound field, $A_0$ is assumed
to be the same for rays propagating in all directions. Now, in order to consider the contribution
of this ray to the CPSD measured by a sensor pair with spacing $d$ and axis orientation
$(\theta_\mathrm{mic}, \phi_\mathrm{mic})$, we define the phase term

$$p(\theta,\phi,k) = e^{-jkd\,[\cos(\theta)\cos(\theta_\mathrm{mic}) + \sin(\theta)\sin(\theta_\mathrm{mic})\cos(\phi - \phi_\mathrm{mic})]} \tag{6}$$

representing the frequency-dependent phase difference between the sensor signals for a plane
wave incident from the direction $(\theta, \phi)$. Assuming lossless propagation, the CPSD of this ray
will then be equal to the PSD of the ray rotated by this phase term.

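The CPSD and PSD of the sensor signals in the reverberant sound field can now be
computed by surface integrals over a sphere to combine the contributions of rays incident from
all directions. The spatial coherence of the decaying reverberant sound field at time $t$ is obtained
as:

$$\Gamma_{12}(k,t) = \frac{\int_0^{2\pi}\int_0^{\pi} p(\theta,\phi,k)\, A(\theta,\phi,t) \sin(\theta)\, d\theta\, d\phi}{\int_0^{2\pi}\int_0^{\pi} A(\theta,\phi,t) \sin(\theta)\, d\theta\, d\phi} \tag{7}$$

$$= \frac{\int_0^{2\pi}\int_0^{\pi} p(\theta,\phi,k)\, R_x^{\frac{2tc|\cos(\theta)|}{L_x}} R_y^{\frac{2tc\sin(\theta)|\cos(\phi)|}{L_y}} R_z^{\frac{2tc\sin(\theta)|\sin(\phi)|}{L_z}} \sin(\theta)\, d\theta\, d\phi}{\int_0^{2\pi}\int_0^{\pi} R_x^{\frac{2tc|\cos(\theta)|}{L_x}} R_y^{\frac{2tc\sin(\theta)|\cos(\phi)|}{L_y}} R_z^{\frac{2tc\sin(\theta)|\sin(\phi)|}{L_z}} \sin(\theta)\, d\theta\, d\phi}. \tag{8}$$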

Closed-form solutions of these integrals can be obtained for special cases. Most notably, the
coherence of a spherically isotropic field is obtained for $R_x = R_y = R_z = 1$ and arbitrary
$\theta_\mathrm{mic}$, $\phi_\mathrm{mic}$, and the coherence of a cylindrically isotropic field is
obtained for $R_x = R_y = 1$, $R_z = 0$, $\phi_\mathrm{mic} = 0^\circ$ and arbitrary
$\theta_\mathrm{mic}$. The modeled coherence is independent of the source
and microphone position, but it is dependent on the orientation of the microphone pair in the
room. Microphone pairs aligned with the x axis ($\theta_\mathrm{mic} = 0^\circ$), as shown in Fig. 1, result in the
simplified phase term $p(\theta,\phi,k) = e^{-jkd\cos(\theta)}$.

Note that the coherence generated from this model will always be real-valued,
independently of the orientation of the sensor pair. This is due to the fact that walls are parallel,
and therefore the imaginary part of the phase term of a ray arriving from the direction $(\theta, \phi)$ is
always exactly cancelled by that of a symmetric ray from the opposite direction $(\pi - \theta, \phi + \pi)$,
i.e., $\Im\{p(\theta,\phi,k)\} + \Im\{p(\pi - \theta, \pi + \phi, k)\} = 0$, where $\Im\{\cdot\}$ denotes the imaginary part of a
complex number.
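A minimal numerical sketch of the coherence model (Eq. 8) is given below, using a plain midpoint rule on a regular grid in place of a library integrator. The function name, grid sizes, and speed of sound of 343 m/s are assumptions. The phase term is computed from the angle between the ray direction and the microphone axis in the coordinates of Eq. 3, which is an assumed general form that reduces to the stated x-aligned case for a microphone angle of zero.

```python
import cmath
import math


def model_coherence(k, d, t, dims, refl, theta_mic=0.0, phi_mic=0.0,
                    c=343.0, n_theta=120, n_phi=240):
    """Midpoint-rule evaluation of the coherence model; angles are in radians."""
    lx, ly, lz = dims
    rx, ry, rz = refl
    h_theta = math.pi / n_theta
    h_phi = 2.0 * math.pi / n_phi
    num = 0j
    den = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * h_theta
        st, ct = math.sin(theta), math.cos(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * h_phi
            # Direction-dependent power after time t (Eq. 5 with A_0 = 1).
            power = (rx ** (2.0 * t * c * abs(ct) / lx)
                     * ry ** (2.0 * t * c * st * abs(math.cos(phi)) / ly)
                     * rz ** (2.0 * t * c * st * abs(math.sin(phi)) / lz))
            # Cosine of the angle between ray direction and microphone axis.
            cos_angle = (ct * math.cos(theta_mic)
                         + st * math.sin(theta_mic) * math.cos(phi - phi_mic))
            weight = power * st * h_theta * h_phi
            num += cmath.exp(-1j * k * d * cos_angle) * weight
            den += weight
    return num / den


dims, d = (6.0, 4.0, 3.0), 0.08
k = 2.0 * math.pi * 1000.0 / 343.0  # wavenumber at 1 kHz
iso = model_coherence(k, d, 0.21, dims, (1.0, 1.0, 1.0))
aniso = model_coherence(k, d, 0.21, dims, (0.8, 1.0, 1.0))
print("all R = 1:", iso.real, " sin(kd)/(kd):", math.sin(k * d) / (k * d))
print("R_x = 0.8:", aniso.real, " imaginary part:", aniso.imag)
```

With all reflection coefficients equal to one, the result should reproduce the diffuse-field coherence for any orientation. The cylindrical special case with a reflection coefficient of exactly zero is a limiting case that this grid cannot evaluate directly, because every grid point then receives zero weight and the denominator vanishes.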

3. Evaluation 

First, we compare the proposed model to coherence estimates from impulse responses generated 
by the image source method (Allen and Berkley, 1979; Peterson, 1986; Habets, 2010). An 
environment of dimensions $[L_x, L_y, L_z] = [6, 4, 3]$ m was simulated containing a linear array of
16 microphones, with uniform spacing, d = 0.08 m, aligned with the x axis and centered in the 
room. The source was located at [4.8,3.2,2.6] m with the origin set at the bottom left hand 
back corner of the room. The reflection coefficients of the walls incident to the x axis were 
set to Rx = 0.8; for the other walls the coefficients were set to Ry = Rz = 1.0. An impulse 
response was generated from a source position to each microphone, with a sampling rate of 
16 kHz, and split into non-overlapping intervals 100 ms in length, in order to allow evaluation 
of the time-variant characteristics of the decaying sound field. In each interval the signal was 
short-time Fourier transformed using a window length of 1024, 75% overlap and a DFT of size
512. A single spatial coherence estimate for each time interval was then obtained by averaging
the involved spectra over the entire interval and over all adjacent microphone pairs. This was
done to reduce the variance of the coherence estimate, allowing the accuracy of the model to
be more easily evaluated. The proposed coherence function (Eq. 8) was then computed using
numerical integration in MATLAB. For each interval, the time point $t$ for the computation of
the coherence model was chosen 10 ms from the start of the interval, since the average coherence
over the interval is dominated by the earlier contributions, when the field has more energy and
therefore provides a larger contribution to the averaged coherence estimate.
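The averaging step can be sketched as follows, assuming the short-time spectra of the two microphone signals are already available as lists of complex frames; frames from several adjacent pairs could be pooled by concatenating them. The function name, the synthetic test data, and the conjugation convention for the CPSD are assumptions of this example, not details given in the text.

```python
import cmath
import math
import random


def estimate_coherence(frames_1, frames_2):
    """Estimate the complex coherence per frequency bin from two lists of STFT frames."""
    n_bins = len(frames_1[0])
    coherence = []
    for b in range(n_bins):
        # Averaged (here: summed) cross- and auto-spectra over all frames.
        s12 = sum(x1[b] * x2[b].conjugate() for x1, x2 in zip(frames_1, frames_2))
        s11 = sum(abs(x1[b]) ** 2 for x1 in frames_1)
        s22 = sum(abs(x2[b]) ** 2 for x2 in frames_2)
        coherence.append(s12 / math.sqrt(s11 * s22))
    return coherence


# Synthetic data: the second channel is the first one with a constant phase shift,
# so the estimated coherence should have unit magnitude in every bin.
random.seed(0)
n_frames, n_bins = 50, 4
frames_a = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_bins)]
            for _ in range(n_frames)]
shift = cmath.exp(-0.3j)
frames_b = [[x * shift for x in frame] for frame in frames_a]
print([round(abs(g), 6) for g in estimate_coherence(frames_a, frames_b)])
```

Summing instead of averaging is sufficient here because the common normalization factor cancels in the ratio.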

Fig. 2 and Fig. 3 show the coherence of the simulated room impulse response for two
different orientations of the microphone array, in comparison to the ideal isotropic models
and the proposed model. Note that the fluctuations in the simulated coherence are due to
the estimation variance. While the sound field is isotropic for the first interval, it becomes
increasingly anisotropic as the sound field decays; this behavior is matched by the proposed
model. Note that the model is also able to account for the effect of sensor orientation. A similar
agreement between the presented model and simulated impulse responses was also obtained for
other configurations of the microphone array and room parameters.


Fig. 2. Comparison of coherence simulated using the image source model and
coherence calculated using the presented model (Eq. 8) for sensor orientation
$\theta_\mathrm{mic} = 0^\circ$, $\phi_\mathrm{mic} = 0^\circ$, within time intervals (a) 0.0 s to 0.1 s, (b) 0.1 s to 0.2 s,
(c) 0.2 s to 0.3 s. Curves: simulated, spherically isotropic, cylindrically isotropic, proposed model.

The coherence model was also compared to measured data in physical environments.
This was done by measuring impulse responses in a room, from which the coherence was then
estimated within non-overlapping 0.05 s time intervals, using the same techniques as for the
simulated impulse responses, but averaging over only two microphone pairs. The room
dimensions were 6 m x 6 m x 3 m, with a reverberation time $T_{60} \approx 0.4$ s, and the reflection
coefficients of the room surfaces were estimated to be $R_x = 0.60$, $R_y = 0.65$, $R_z = 0.80$, using typical
material properties. The presented coherence model was then used with these parameters to
model the coherence of the decaying sound field in the room, and compared to the coherence
estimated from the impulse responses. Fig. 4 shows that the measured coherence initially roughly
matches that of the spherically isotropic model as expected, but then diverges from that model.
The proposed model matches the measured coherence well for all time intervals.

Fig. 3. Comparison of coherence simulated using the image source model and
coherence calculated using the presented model (Eq. 8) for sensor orientation
$\theta_\mathrm{mic} = 20^\circ$, $\phi_\mathrm{mic} = 36^\circ$, within time intervals (a) 0.0 s to 0.1 s, (b) 0.1 s to 0.2 s,
(c) 0.2 s to 0.3 s.


Fig. 4. Comparison of coherence from measured room impulse responses and
coherence calculated using the presented model (Eq. 8) within time intervals (a)
0.0 s to 0.05 s, (b) 0.05 s to 0.1 s, (c) 0.1 s to 0.15 s. Horizontal axis: f [kHz].
Curves: measured, spherically isotropic, cylindrically isotropic, proposed model.


4. Conclusions 

A model for the temporal evolution of the spatial coherence between two sensors in a decaying 
reverberant sound field was developed, based on a ray-based sound propagation model in a 
rectangular room. Using room dimensions, surface reflection coefficients and orientation of the 
sensor pair as known parameters, the model can predict the time-varying characteristics of the 
coherence function, as confirmed by experiments using simulated and measured room impulse 
responses. The modeled coherence function is independent of the source and sensor positions 
in the room, and, due to the rectangular room geometry, always real-valued. For special cases, 
the proposed model simplifies to the ideal spherically and cylindrically isotropic sound field 
models. 

The presented model is based on a reverberant sound field generated by reflection and
attenuation in a rectangular room, where each surface is characterized by a single reflection
coefficient. More complex geometries, non-uniform surface reflectivity factors, or diffracting
objects obstructing the sound field may impact the applicability of this model.